Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative
understanding of Internet traffic. Our team collects and curates the largest
publicly available Internet traffic data containing 50 billion packets.
A novel hypersparse neural network analysis of "video" streams of this traffic, run on 10,000 processors in the MIT SuperCloud, reveals a new phenomenon: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our neural network approach further shows that a
two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide
variety of source/destination statistics on moving sample windows ranging from
100,000 to 100,000,000 packets over collections that span years and continents.
The inferred model parameters distinguish different network streams and the
model leaf parameter strongly correlates with the fraction of the traffic in
different underlying network topologies. The hypersparse neural network
pipeline is highly adaptable and different network statistics and training
models can be incorporated with simple changes to the image filter functions.
Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High Performance Extreme Computing (HPEC) 201
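The two-parameter modified Zipf-Mandelbrot model described above can be sketched in Python; the exponent and offset values below, and the degree range, are illustrative assumptions, not parameters inferred in the paper:

```python
import numpy as np

def zipf_mandelbrot(d, alpha, delta):
    """Unnormalized two-parameter modified Zipf-Mandelbrot probability:
    p(d) proportional to 1 / (d + delta)**alpha, where d is, e.g., the
    number of packets observed from a given source (illustrative use)."""
    return 1.0 / (d + delta) ** alpha

# Normalize over an observed degree range d = 1..1000 (illustrative).
d = np.arange(1, 1001)
p = zipf_mandelbrot(d, alpha=1.6, delta=0.4)
p = p / p.sum()
```

Fitting alpha and delta to the observed source/destination counts of each sample window, at window sizes from 100,000 to 100,000,000 packets as the paper describes, then lets the two parameters act as a compact signature of the stream.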
Lincoln AI Computing Survey (LAICS) Update
This paper is an update of the survey of AI accelerators and processors from the past four years, which is now called the Lincoln AI Computing Survey - LAICS
(pronounced "lace"). As in past years, this paper collects and summarizes the
current commercial accelerators that have been publicly announced with peak
performance and peak power consumption numbers. The performance and power
values are plotted on a scatter graph, and a number of dimensions and
observations from the trends on this plot are again discussed and analyzed.
Market segments are highlighted on the scatter plot, and zoomed plots of each
segment are also included. Finally, a brief description of each of the new
accelerators that have been added to the survey this year is included.
Comment: 7 pages, 6 figures, 2023 IEEE High Performance Extreme Computing (HPEC) conference, September 202
Parallel Vectorized Algebraic AES in MATLAB for Rapid Prototyping of Encrypted Sensor Processing Algorithms and Database Analytics
The increasing use of networked sensor systems and networked databases has
led to an increased interest in incorporating encryption directly into sensor
algorithms and database analytics. MATLAB is the dominant tool for rapid
prototyping of sensor algorithms and has extensive database analytics
capabilities. The advent of high level and high performance Galois Field
mathematical environments allows encryption algorithms to be expressed
succinctly and efficiently. This work leverages the Galois Field primitives found in the MATLAB Communication Toolbox to implement a mode of the Advanced Encryption Standard (AES) based on first-principles mathematics. The resulting
implementation requires 100x less code than standard AES implementations and
delivers speed that is effective for many design purposes. The parallel version
achieves speed comparable to native OpenSSL on a single node and is sufficient
for real-time prototyping of many sensor processing algorithms and database
analytics.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing Conference (HPEC) 201
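As a hedged illustration of the first-principles Galois Field arithmetic AES is built on, the following is a generic Python sketch of a GF(2^8) byte multiply, not the paper's vectorized MATLAB implementation:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial
    x^8 + x^4 + x^3 + x + 1 (0x11B), using shift-and-add."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce modulo the AES polynomial
        b >>= 1
    return p

# Worked example from FIPS-197: {57} * {83} = {c1}
assert gf_mul(0x57, 0x83) == 0xC1
```

Every byte operation in AES (the SubBytes inverses, the MixColumns multiplies) reduces to this arithmetic, which is what lets a Galois Field toolbox express the cipher so compactly.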
AI and ML Accelerator Survey and Trends
This paper updates the survey of AI accelerators and processors from the past three years. It collects and summarizes the current commercial
accelerators that have been publicly announced with peak performance and power
consumption numbers. The performance and power values are plotted on a scatter
graph, and a number of dimensions and observations from the trends on this plot
are again discussed and analyzed. Two new trends plots based on accelerator
release dates are included in this year's paper, along with the additional
trends of some neuromorphic, photonic, and memristor-based inference
accelerators.
Comment: 10 pages, 4 figures, 2022 IEEE High Performance Extreme Computing (HPEC) Conference. arXiv admin note: substantial text overlap with arXiv:2009.00993, arXiv:2109.0895
Performance Measurements of Supercomputing and Cloud Storage Solutions
Increasing amounts of data from varied sources, particularly in the fields of
machine learning and graph analytics, are causing storage requirements to grow
rapidly. A variety of technologies exist for storing and sharing these data,
ranging from parallel file systems used by supercomputers to distributed block
storage systems found in clouds. Relatively few comparative measurements exist
to inform decisions about which storage systems are best suited for particular
tasks. This work provides these measurements for two of the most popular
storage technologies: Lustre and Amazon S3. Lustre is an open-source, high
performance, parallel file system used by many of the largest supercomputers in
the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web
Services offering, and offers a scalable, distributed option to store and
retrieve data from anywhere on the Internet. Parallel processing is essential
for achieving high performance on modern storage systems. The performance tests
used span the gamut of parallel I/O scenarios, ranging from single-client,
single-node Amazon S3 and Lustre performance to a large-scale, multi-client
test designed to demonstrate the capabilities of a modern storage appliance
under heavy load. These results show that, when parallel I/O is used correctly
(i.e., many simultaneous read or write processes), full network bandwidth
performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3
connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These
results demonstrate that S3 is well-suited to sharing vast quantities of data
over the Internet, while Lustre is well-suited to processing large quantities
of data locally.
Comment: 5 pages, 4 figures, to appear in IEEE HPEC 201
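The many-simultaneous-readers access pattern behind these measurements can be sketched generically; `parallel_read` is a hypothetical helper run over a local file so the example is self-contained, while a real S3 or Lustre client would issue the same byte-range requests over the network:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def parallel_read(path, n_workers=8):
    """Read a file as n_workers simultaneous byte-range reads and
    reassemble the result: the many-reader pattern that lets parallel
    storage systems approach full network bandwidth."""
    size = os.path.getsize(path)
    chunk = max(1, (size + n_workers - 1) // n_workers)

    def read_range(offset):
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(chunk)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return b"".join(pool.map(read_range, range(0, size, chunk)))
```

The key design point, reflected in the results above, is that a single reader rarely saturates the link; aggregate bandwidth comes from many concurrent read or write processes.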
Lustre, Hadoop, Accumulo
Data processing systems impose multiple views on data as it is processed by
the system. These views include spreadsheets, databases, matrices, and graphs.
There are a wide variety of technologies that can be used to store and process
data through these different steps. The Lustre parallel file system, the Hadoop
distributed file system, and the Accumulo database are all designed to address
the largest and the most challenging data storage problems. There have been
many ad-hoc comparisons of these technologies. This paper describes the
foundational principles of each technology, provides simple models for
assessing their capabilities, and compares the various technologies on a
hypothetical common cluster. These comparisons indicate that Lustre provides 2x
more storage capacity, is less likely to lose data during three simultaneous drive
failures, and provides higher bandwidth on general purpose workloads. Hadoop
can provide 4x greater read bandwidth on special purpose workloads. Accumulo
provides 10,000x lower latency on random lookups than either Lustre or Hadoop
but Accumulo's bulk bandwidth is 10x less. Significant recent work has been
done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo
to be combined in different ways.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 201
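A minimal sketch of the kind of capacity model used in such comparisons, assuming a typical 8+2 RAID6 layout for Lustre and Hadoop's default 3x replication (illustrative configurations, not the paper's exact model):

```python
def usable_fraction_raid(data_disks: int, parity_disks: int) -> float:
    """Fraction of raw capacity usable under parity RAID (e.g., Lustre on RAID6)."""
    return data_disks / (data_disks + parity_disks)

def usable_fraction_replication(copies: int) -> float:
    """Fraction of raw capacity usable under n-way replication (e.g., HDFS)."""
    return 1.0 / copies

lustre = usable_fraction_raid(8, 2)      # 8+2 RAID6 -> 0.8 of raw capacity
hadoop = usable_fraction_replication(3)  # 3 copies  -> ~0.33 of raw capacity
advantage = lustre / hadoop              # ~2.4x under these assumptions
```

Under these assumed configurations Lustre keeps about 0.8 of raw disk capacity versus roughly 0.33 for 3x replication, in line with the roughly 2x storage advantage quoted above.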
Enabling On-Demand Database Computing with MIT SuperCloud Database Management System
The MIT SuperCloud database management system allows for rapid creation and
flexible execution of a variety of the latest scientific databases, including
Apache Accumulo and SciDB. It is designed to permit these databases to run on a
High Performance Computing Cluster (HPCC) platform as seamlessly as any other
HPCC job. It ensures the seamless migration of the databases to the resources
assigned by the HPCC scheduler and centralized storage of the database files
when not running. It also permits snapshotting of databases to allow
researchers to experiment and push the limits of the technology without
concerns for data or productivity loss if the database becomes unstable.
Comment: 6 pages; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2015. arXiv admin note: text overlap with arXiv:1406.492
Lessons Learned from a Decade of Providing Interactive, On-Demand High Performance Computing to Scientists and Engineers
For decades, the use of HPC systems was limited to those in the physical
sciences who had mastered their domain in conjunction with a deep understanding
of HPC architectures and algorithms. During these same decades, consumer
computing device advances produced tablets and smartphones that allow millions
of children to interactively develop and share code projects across the globe.
As the HPC community faces the challenges associated with guiding researchers
from disciplines using high productivity interactive tools to effective use of
HPC systems, it seems appropriate to revisit the assumptions surrounding the
necessary skills required for access to large computational systems. For over a
decade, MIT Lincoln Laboratory has been supporting interactive, on-demand high
performance computing by seamlessly integrating familiar high productivity
tools to provide users with an increased number of design turns, rapid
prototyping capability, and faster time to insight. In this paper, we discuss
the lessons learned while supporting interactive, on-demand high performance
computing from the perspectives of the users and the team supporting the users
and the system. Building on these lessons, we present an overview of current
needs and the technical solutions we are building to lower the barrier to entry
for new users from the humanities, social, and biological sciences.
Comment: 15 pages, 3 figures, First Workshop on Interactive High Performance Computing (WIHPC) 2018 held in conjunction with ISC High Performance 2018 in Frankfurt, German
Survey and Benchmarking of Machine Learning Accelerators
Advances in multicore processors and accelerators have opened the floodgates
to greater exploration and application of machine learning techniques to a
variety of applications. These advances, along with breakdowns of several
trends including Moore's Law, have prompted an explosion of processors and
accelerators that promise even greater computational and machine learning
capabilities. These processors and accelerators are coming in many forms, from
CPUs and GPUs to ASICs, FPGAs, and dataflow accelerators. This paper surveys
the current state of these processors and accelerators that have been publicly
announced with performance and power consumption numbers. The performance and
power values are plotted on a scatter graph and a number of dimensions and
observations from the trends on this plot are discussed and analyzed. For
instance, there are interesting trends in the plot regarding power consumption,
numerical precision, and inference versus training. We then select and
benchmark two commercially-available low size, weight, and power (SWaP)
accelerators, as these processors are the most relevant to the embedded and mobile machine learning inference applications of greatest interest to the DoD and other SWaP-constrained users. We determine how they actually perform
with real-world images and neural network models, compare those results to the
reported performance and power consumption values and evaluate them against an
Intel CPU that is used in some embedded applications.
Comment: 9 pages, 3 figures, IEEE-HPEC conference, Waltham, MA, September 24-26, 201